CiteSeerX

Sketch *-metric: Comparing Data Streams via Sketching

Author: Anceaume Emmanuelle
Busnel Yann
Publication venue: HAL CCSD
Publication date: 01/07/2012

12 pages, two columns.

In this paper, we consider the problem of estimating the distance between any two large data streams under a small-space constraint. This problem is of utmost importance in data-intensive monitoring applications where input streams are generated rapidly. These streams need to be processed on the fly and accurately to quickly determine any deviance from nominal behavior. We present a new metric, the Sketch ⋆-metric, which makes it possible to define a distance between updatable summaries (or sketches) of large data streams. An important feature of the Sketch ⋆-metric is that, given a measure on the entire initial data streams, the Sketch ⋆-metric preserves the axioms of the latter measure on the sketch (such as non-negativity, identity, symmetry, and the triangle inequality, as well as specific properties of the f-divergence). Extensive experiments conducted on both synthetic traces and real data allow us to validate the robustness and accuracy of the Sketch ⋆-metric.

A Comparative Study of Rateless Codes for P2P Persistent Storage

Author: Anceaume Emmanuelle
Ribeiro Heverson
Publication venue: HAL CCSD
Publication date: 20/09/2010

This paper evaluates the performance of two seminal rateless erasure codes, LT Codes and Online Codes. Their properties make them appropriate for coping with communication channels having an unbounded loss rate. They are therefore very well suited to peer-to-peer systems. This evaluation targets two goals. First, it compares the performance of both codes in different adversarial environments and in different application contexts. Second, it helps in understanding how the parameters driving the behavior of the coding impact its complexity. To the best of our knowledge, this is the first comprehensive study that helps application designers set the optimal values for the coding parameters to best fit their P2P context.

arXiv.org e-Print Archive

A framework for proving the self-organization of dynamic systems

Author: Anceaume Emmanuelle
Défago Xavier
Potop-Butucaru Maria
Roy Matthieu
Publication date: 09/11/2010

This paper aims at providing a rigorous definition of self-organization, one of the most desired properties for dynamic systems (e.g., peer-to-peer systems, sensor networks, cooperative robotics, or ad-hoc networks). We characterize different classes of self-organization through liveness and safety properties that both capture information regarding the system entropy. We illustrate these classes through case studies. The first ones are two representative P2P overlays (CAN and Pastry) and the others are specific implementations of Ω (the leader oracle) and one-shot query abstractions for dynamic settings. Our study aims at understanding the limits and respective power of existing self-organized protocols and lays the basis for designing robust algorithms for dynamic systems.

Scientific Publications of the University of Toulouse II Le Mirail

arXiv.org e-Print Archive

Optimization results for a generalized coupon collector problem

Author: Anceaume Emmanuelle
Busnel Yann
Schulte-Geers Ernst
Sericola Bruno
Publication date: 01/01/2015

We study in this paper a generalized coupon collector problem, which consists in analyzing the time needed to collect a given number of distinct coupons that are drawn from a set of coupons with an arbitrary probability distribution. We suppose that a special coupon called the null coupon can be drawn but never belongs to any collection. In this context, we prove that the almost uniform distribution, for which all the non-null coupons have the same drawing probability, is the distribution that stochastically minimizes the time needed to collect a fixed number of distinct coupons. Moreover, we show that in a given closed subset of probability distributions, the distribution with all its entries but one equal to the smallest possible value is the one that stochastically maximizes the time needed to collect a fixed number of distinct coupons. A computer science application shows the utility of these results.

Comment: arXiv admin note: text overlap with arXiv:1402.524
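
To make the setting of this abstract concrete, the following minimal sketch computes the exact expected number of draws needed to collect `c` distinct non-null coupons for a small coupon set. It is an illustration only, not the authors' analysis: the function name `expected_collection_time`, the bitmask recursion, and the example probabilities are all assumptions introduced here. It only compares expectations, which is a weaker statement than the stochastic ordering proved in the paper.

```python
from functools import lru_cache


def expected_collection_time(probs, c):
    """Expected draws to collect c distinct non-null coupons.

    probs[j] is the (strictly positive) drawing probability of non-null
    coupon j; the null coupon implicitly has probability 1 - sum(probs).
    Hypothetical helper for illustration; exponential in len(probs).
    """
    n = len(probs)
    if not 0 < c <= n:
        raise ValueError("c must be between 1 and the number of coupons")

    @lru_cache(maxsize=None)
    def remaining(mask):
        # mask encodes the set of coupons already collected.
        if bin(mask).count("1") == c:
            return 0.0
        new_mass = 0.0   # probability of drawing a not-yet-collected coupon
        weighted = 0.0   # sum of p_j * E[remaining draws after adding j]
        for j in range(n):
            if not mask & (1 << j):
                new_mass += probs[j]
                weighted += probs[j] * remaining(mask | (1 << j))
        # Draws that hit the null coupon or a collected coupon leave the
        # state unchanged, hence the division by new_mass.
        return (1.0 + weighted) / new_mass

    return remaining(0)


# Null coupon probability 0.2 in both cases (assumed example values).
almost_uniform = expected_collection_time([0.2, 0.2, 0.2, 0.2], 3)
skewed = expected_collection_time([0.5, 0.1, 0.1, 0.1], 3)
```

In this small instance the almost uniform distribution yields the smaller expected collection time, in line with the minimization result stated in the abstract.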

SQUARE: Scalable Quorum-Based Atomic Memory with Local Reconfiguration

Author: Anceaume Emmanuelle
Gramoli Vincent
Virgillito Antonino
Publication venue: HAL CCSD
Publication date: 01/01/2006

Internet applications require more and more resources to satisfy the unpredictable needs of clients. Specifically, such applications must ensure quality of service despite bursts of load. Distributed dynamic self-organized systems present an inherent adaptiveness that can face unpredictable bursts of load. Nevertheless, quality of service, and more particularly data consistency, remains hard to achieve in such systems since participants (i.e., nodes) can crash, leave, and join the system at arbitrary times. Atomic consistency guarantees that any read operation returns the last written value of a data item and is preserved under data composition. To guarantee atomicity in the message-passing model, mutually intersecting sets (a.k.a. quorums) of nodes are used. The solution presented here, namely SQUARE, provides scalability, load-balancing, fault-tolerance, and self-adaptiveness, while ensuring atomic consistency. We specify our solution, prove it correct, and analyse it through simulations.
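
The intersection property that quorums rely on can be illustrated with a generic grid construction. This is a minimal sketch of a textbook row-plus-column quorum, not a description of how SQUARE itself builds or reconfigures its quorums; the function `grid_quorum` and the grid dimensions are assumptions made for the example.

```python
from itertools import product


def grid_quorum(rows, cols, r, c):
    """Return the quorum made of row r and column c of a rows x cols grid.

    Nodes are identified by (row, column) pairs. Hypothetical helper used
    only to illustrate mutual intersection.
    """
    row_nodes = {(r, j) for j in range(cols)}
    col_nodes = {(i, c) for i in range(rows)}
    return row_nodes | col_nodes


# Enumerate every quorum of a 3 x 4 grid (assumed example size).
ROWS, COLS = 3, 4
all_quorums = [grid_quorum(ROWS, COLS, r, c)
               for r, c in product(range(ROWS), range(COLS))]

# Any two quorums share at least one node: the row of one always crosses
# the column of the other.
pairwise_intersecting = all(q1 & q2 for q1 in all_quorums for q2 in all_quorums)
```

Because every pair of quorums shares a node, a read that contacts one quorum always reaches at least one node that took part in the latest write to another quorum, which is the basic ingredient behind atomic reads and writes.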

Uniform and Ergodic Sampling in Unstructured Peer-to-Peer Systems with Malicious Nodes

Author: Anceaume Emmanuelle
Busnel Yann
Gambs Sébastien
Publication venue: Springer Science and Business Media LLC
Publication date: 14/12/2010

ISBN: 978-3-642-17652-4

We consider the problem of uniform sampling in large-scale open systems. Uniform sampling is a fundamental schema that guarantees that any individual in a population has the same probability of being selected as a sample. An important issue that seriously hampers the feasibility of uniform sampling in open and large-scale systems is the inevitable presence of malicious nodes. In this paper, we show that restricting the number of requests that malicious nodes can issue and allowing for full knowledge of the composition of the system is a necessary and sufficient condition to guarantee uniform and ergodic sampling. In a nutshell, uniform and ergodic sampling guarantees that any node in the system is equally likely to appear as a sample at any non-malicious node in the system and that, infinitely often, any node has a non-null probability of appearing as a sample at any honest node.

Analysis of a large number of Markov chains competing for transitions

Author: Anceaume Emmanuelle
Castella François
Sericola Bruno
Publication venue: Informa UK Limited
Publication date: 01/03/2014

We consider the behavior of a stochastic system composed of several identically distributed, but non-independent, discrete-time absorbing Markov chains competing at each instant for a transition. The competition consists in determining at each instant, using a given probability distribution, the only Markov chain allowed to make a transition. We analyze the first time at which one of the Markov chains reaches its absorbing state. When the number of Markov chains goes to infinity, we analyze the asymptotic behavior of the system for an arbitrary probability mass function governing the competition. We give conditions for the existence of the asymptotic distribution and we show how these results apply to cluster-based distributed systems when the competition between the Markov chains is handled by using a geometric distribution.
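
The competition mechanism described in this abstract can be simulated in a few lines. The sketch below is an illustrative assumption, not the paper's analytical method: the function `first_absorption_time`, the three-state transition matrix, and the truncated geometric weights are all hypothetical choices made for the example.

```python
import random


def first_absorption_time(transition, absorbing, n_chains, weights, rng, start=0):
    """Simulate chains competing for transitions; return the first absorption time.

    transition[s] is the row of transition probabilities out of state s,
    shared by all chains. At each instant one chain is selected according
    to `weights` and is the only one allowed to move.
    """
    states = [start] * n_chains
    t = 0
    while True:
        t += 1
        # The competition: pick the single chain allowed to make a transition.
        i = rng.choices(range(n_chains), weights=weights)[0]
        row = transition[states[i]]
        states[i] = rng.choices(range(len(row)), weights=row)[0]
        if states[i] == absorbing:
            return t


# Assumed toy chain: states 0 and 1 are transient, state 2 is absorbing.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]

# Truncated geometric weights over 5 chains (parameter 0.5, an assumption).
geometric_weights = [0.5 * (0.5 ** k) for k in range(5)]

rng = random.Random(42)
sample_time = first_absorption_time(P, 2, 5, geometric_weights, rng)
```

Averaging `first_absorption_time` over many runs, for increasing `n_chains`, gives an empirical view of the quantity whose asymptotic distribution the paper studies.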

On the Power of the Adversary to Solve the Node Sampling Problem

Author: Anceaume Emmanuelle
Busnel Yann
Gambs Sébastien
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2013

We study the problem of achieving uniform and fresh peer sampling in large-scale dynamic systems under adversarial behaviors. Briefly, uniform and fresh peer sampling guarantees that any node in the system is equally likely to appear as a sample at any non-malicious node in the system and that, infinitely often, any node has a non-null probability of appearing as a sample of honest nodes. This sample is built locally out of a stream of node identifiers received at each node. An important issue that seriously hampers the feasibility of node sampling in open and large-scale systems is the unavoidable presence of malicious nodes. The objective of malicious nodes mainly consists in continuously and largely biasing the input data stream out of which samples are obtained, to prevent (honest) nodes from being selected as samples. First, we demonstrate that restricting the number of requests that malicious nodes can issue and providing full knowledge of the composition of the system is a necessary and sufficient condition to guarantee uniform and fresh sampling. We also define and study two types of adversary models: an omniscient adversary that has the capacity to eavesdrop on all the messages that are exchanged within the system, and a blind adversary that can only observe messages that have been sent or received by nodes it controls. The former model allows us to derive lower bounds on the impact that the adversary has on the sampling functionality, while the latter one corresponds to a more realistic setting. Given any sampling strategy, we quantify the minimum effort exerted by both types of adversary on any input stream to prevent this sampling strategy from outputting a uniform and fresh sample.